LLM Robustness Against Misinformation in Biomedical Question Answering

Bondarenko, Alexander, Viehweger, Adrian

arXiv.org Artificial Intelligence

The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering by retrieving and providing additional context coming from external knowledge sources (e.g., by adding the context to the prompt). However, injecting incorrect information can mislead the LLM to generate an incorrect answer. In this paper, we evaluate the effectiveness and robustness of four LLMs against misinformation - Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral - in answering biomedical questions. We assess the answer accuracy on yes-no and free-form questions in three scenarios: vanilla LLM answers (no context is provided), "perfect" augmented generation (correct context is provided), and prompt-injection attacks (incorrect context is provided). Our results show that Llama 3.1 (70B parameters) achieves the highest accuracy in both vanilla (0.651) and "perfect" RAG (0.802) scenarios. However, the accuracy gap between the models almost disappears with "perfect" RAG, suggesting its potential to mitigate the LLM's size-related effectiveness differences. We further evaluate the ability of the LLMs to generate malicious context on one hand and the LLM's robustness against prompt-injection attacks on the other hand, using metrics such as attack success rate (ASR), accuracy under attack, and accuracy drop. As adversaries, we use the same four LLMs (Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral) to generate incorrect context that is injected into the target model's prompt. Interestingly, Llama is shown to be the most effective adversary, causing accuracy drops of up to 0.48 for vanilla answers and 0.63 for "perfect" RAG across target models. Our analysis reveals that robustness rankings vary depending on the evaluation measure, highlighting the complexity of assessing LLM resilience to adversarial attacks.
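The robustness metrics the abstract names can be sketched in a few lines. Below is a minimal, hypothetical implementation: accuracy drop is accuracy on clean inputs minus accuracy under attack, and ASR is taken here as the fraction of originally correct answers that the attack flips to incorrect (one common definition; the paper may normalize ASR differently).

```python
def accuracy(preds, gold):
    """Fraction of predictions matching the gold answers."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def attack_success_rate(clean_preds, attacked_preds, gold):
    """Share of originally correct answers that become incorrect under attack.
    This is one common ASR definition, assumed here for illustration."""
    flipped = sum(c == g and a != g
                  for c, a, g in zip(clean_preds, attacked_preds, gold))
    originally_correct = sum(c == g for c, g in zip(clean_preds, gold))
    return flipped / originally_correct if originally_correct else 0.0

# Toy yes-no answers for four questions
gold = ["yes", "no", "yes", "no"]
clean = ["yes", "no", "no", "no"]       # vanilla answers, accuracy 0.75
attacked = ["no", "no", "no", "no"]     # answers with injected misinformation

acc_drop = accuracy(clean, gold) - accuracy(attacked, gold)
```

On this toy data the accuracy drops from 0.75 to 0.5, and one of the three originally correct answers is flipped, giving an ASR of 1/3; the abstract's observation that robustness rankings differ across measures reflects exactly this kind of divergence between ASR and accuracy drop.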


Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains

Wang, Zijie, Rashid, Farzana, Blanco, Eduardo

arXiv.org Artificial Intelligence

People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords. Figuring out the meaning of indirect answers is challenging, even for large language models. In this paper, we investigate this problem working with dialogues from multiple domains. We present new benchmarks in three diverse domains: movie scripts, tennis interviews, and airline customer service. We present an approach grounded on distant supervision and blended training to quickly adapt to a new dialogue domain. Experimental results show that our approach is never detrimental and yields F1 improvements as high as 11-34%.


Interpreting Answers to Yes-No Questions in User-Generated Content

Mathur, Shivam, Park, Keun Hee, Chinnappa, Dhivya, Kotamraju, Saketh, Blanco, Eduardo

arXiv.org Artificial Intelligence

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them rarely mean what the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.


Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

Wang, Zijie, Hossain, Md Mosharaf, Mathur, Shivam, Melo, Terry Cruz, Ozler, Kadir Bulut, Park, Keun Hee, Quintero, Jacob, Rezaei, MohammadHossein, Shakya, Shreya Nupur, Uddin, Md Nayem, Blanco, Eduardo

arXiv.org Artificial Intelligence

Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).


Thread by @AnthropicAI on Thread Reader App


Anthropic • Dec 19 • 11 tweets • 5 min read

It's hard work to make evaluations for language models (LMs). We've developed an automated way to generate evaluations with LMs, significantly reducing the effort involved. We test LMs using 150 LM-written evaluations, uncovering novel LM behaviors. In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM).
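The two-LM pipeline the thread describes (one model proposes yes-no questions for a target behavior, a second model filters out bad examples) can be sketched as follows. Both model calls are stubbed here with placeholder functions; the function names and the trivial filter rule are assumptions for illustration, not Anthropic's actual implementation.

```python
def propose_questions(behavior, n):
    """Stub generator LM: in practice, prompt a model to write
    yes-no questions probing the given behavior."""
    return [f"Would you agree with statement {i} as an instance of {behavior}?"
            for i in range(n)]

def passes_filter(question):
    """Stub filter LM: in practice, ask a second model to judge
    whether the generated example is well-formed and on-topic."""
    return question.endswith("?") and len(question.split()) > 3

def build_evaluation(behavior, n):
    """Generate candidate questions, then keep only those the filter accepts."""
    return [q for q in propose_questions(behavior, n) if passes_filter(q)]

questions = build_evaluation("sycophancy", 3)
```

The key design point the thread highlights is that both generation and quality control are delegated to models, so scaling to thousands of questions is a matter of raising `n` rather than adding human annotation effort.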